Journey to Bristol: my first ever Data Study Group hosted by the Turing

Hyesop
2019-08-15

A week ago, I attended a Turing Data Study Group held in Bristol. The Turing committee said it was their first time hosting a study group outside London, and the University of Bristol had taken up the opportunity.

On Monday morning, we were greeted by a group of people in pink shirts, who turned out to be our greatest helpers throughout the week. After the welcome talk, the six challenge PIs introduced their research, data, and agendas for the participants to tackle over the rest of the week. As my PhD project is about air pollution exposure, I chose the city council’s challenge, Get Bristol Moving - Tackling air pollution in Bristol City Centre - to test how well I had understood pollution during my degree, and also to apply my knowledge to a city that I wasn’t familiar with at all!

What was I doing?

I met 8 participants and a facilitator to work on this challenge. Each of us had different experience and skills, so we split the group into four sub-projects (air quality, transport, weather, and various modelling), and I joined the air quality project.

Data collection

The Bristol air quality dataset was available from the Bristol Opendata Portal, where it is archived and published by the city council. We collected live and historic continuous air quality data (1998-2019) from 16 monitoring stations.

Bristol has monitored nitrogen oxides (NOx, which include NO and NO2) at a single site since January 1998, and has since expanded monitoring to multiple stations around the city centre. This might be because NOx levels vary over time, meaning that rush hour congestion can cause negative health outcomes in the long term (Zhang and Batterman, 2013).

At present, the city has 7 stations (the locations in bold text) that monitor NOx, NO2, NO, and PM10 (see Figure 1). Each station has its own monitoring purpose: Colston Avenue (detects worst-case exposure in the city centre), AURN St Pauls (detects background exposure in a residential area), Brislington Depot (freight), Fishponds Road (residential and shopping), Parson Street School (residential area + school zone), Wells Road (continuous traffic in and out of the city centre).

[Table 1. General information on pollution monitoring sites in Bristol. The seven stations in bold are currently measuring pollution data]

Location                              Long      Lat       Date_Start  Date_End    Total period (for terminated stations)
AQ Mesh Temple Way                    -2.56208  51.47528  2019-01-01  2019-04-23  approx. 3 months
AURN St Pauls                         -2.55996  51.44175  2006-06-15  2019-08-05
Bath Road                             -2.60496  51.43267  2005-10-29  2013-01-04  approx. 7 years
Brislington Depot                     -2.59626  51.45543  2001-01-01  2019-08-05
Cheltenham Road Station Road          -2.58448  51.44888  2008-06-25  2011-01-01  approx. 2.5 years
Colston Avenue                        -2.56374  51.42786  2018-03-11  2019-08-05
Fishponds Road                        -2.68878  51.48999  2009-03-13  2019-08-05
IKEA M32                              -2.56272  51.45779  1998-01-10  2000-12-06  approx. 3 years
Newfoundland Road Police Station      -2.58225  51.46067  2005-01-01  2015-12-31  10 years
Parson Street School                  -2.57138  51.44254  2002-01-02  2019-08-05
Rupert Street                         -2.58454  51.46283  2003-01-01  2015-12-31  approx. 12 years
Shiner’s Garage                       -2.59273  51.46894  2004-06-24  2013-01-04  approx. 8.5 years
Temple Meads Station                  -2.53523  51.47804  2003-02-01  2003-10-27  approx. 9 months
Temple Way                            -2.58399  51.45795  2017-04-01  2019-08-05
Trailer Portway P&R                   -2.59665  51.45527  2004-03-01  2009-03-01  approx. 5 years
Wells Road A37 Airport Road Junction  -2.58399  51.45795  2003-05-23  2019-08-05



Data Pre-processing

File download and rename columns

My colleague is a data scientist at a retail consultancy, and one of his jobs was to make sense of large volumes of customer profile data. From his experience, he thought the first things to do were cleaning the data and renaming the columns. He downloaded the files named Air Quality Data Continuous.csv using tidyverse, then joined local weather data gathered from a station just outside Bristol city centre. Finally, he saved the huge data frame with his favourite package, feather, which is compatible with both R and Python.

And guess what? It took me just 2 seconds to load 1.1 million rows across 41 variables!! That was why it was called feather!
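The preparation steps described above (rename, join the weather data, save as feather) could be sketched roughly as follows. This is only an illustration under assumptions: the toy values, the weather column temp_c, and the renamed columns nox and no2 are hypothetical, and the small in-memory tables stand in for the real CSV and weather files.

```r
library(dplyr)
library(tibble)
library(feather)

# Toy stand-in for the raw air quality file (values are made up)
raw_aq <- tibble(
  `Date Time` = as.POSIXct(c("2019-01-01 00:00:00", "2019-01-01 01:00:00"), tz = "UTC"),
  NOx = c(40.1, 35.6),
  NO2 = c(22.3, 20.8),
  Location = c("Colston Avenue", "Colston Avenue")
)

# Toy hourly weather table (hypothetical column names)
weather <- tibble(
  datetime = as.POSIXct(c("2019-01-01 00:00:00", "2019-01-01 01:00:00"), tz = "UTC"),
  temp_c = c(4.2, 3.9)
)

# Rename to short, syntactic names, then attach the weather by timestamp
aq_clean <- raw_aq %>%
  rename(datetime = `Date Time`, nox = NOx, no2 = NO2, location_name = Location) %>%
  left_join(weather, by = "datetime")

# Save to a feather file and read it back
path <- tempfile(fileext = ".feather")
write_feather(aq_clean, path)
aq_back <- read_feather(path)
```

A left join keeps every air quality row even when no weather reading exists for that hour, which seems the safer default when the pollution data is the main interest.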


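# Use English names for dates (this locale string works on Windows)
Sys.setlocale("LC_ALL","English")
library(tidyverse)
library(tidyquant)
library(leaflet)
library(feather)

# Load the combined air quality and weather data, sorted by time
aq <- read_feather("air_and_weather.feather") %>% 
        arrange(datetime)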

Take a glance at the raw data below! You can see there are over 40 different variables measured at different stations on an hourly basis. Unfortunately, most of the variables were largely missing, but luckily our main pollutants, nitrogen oxides (NOx, NO2, NO), were monitored.


# A tibble: 1,146,616 x 20
   `Date Time`           NOx   NO2    NO  PM10 SiteID NVPM10 VPM10 NVPM2.5 PM2.5 VPM2.5    CO    O3   SO2 Temperature RH    Location geo_point_2d
   <dttm>              <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <lgl>       <lgl> <chr>    <chr>       
 1 2017-11-27 15:00:00  23.5  16.3  4.65   7.1    452    5.8   1.3     3.8   5.6    1.8    NA  64.8    NA NA          NA    AURN St… 51.46282815…
 2 2017-11-27 18:00:00  23.7  18.9  3.12   3.7    452    3     0.7     1.9   3.8    1.9    NA  62.3    NA NA          NA    AURN St… 51.46282815…
 3 2017-11-27 22:00:00  26.7  23.1  2.34  12.6    452   10     2.6     3.5   3.9    0.4    NA  48.7    NA NA          NA    AURN St… 51.46282815…
 4 2017-11-28 06:00:00  26.7  22.5  2.76   6      452    6     0       2.5   3.1    0.6    NA  50.9    NA NA          NA    AURN St… 51.46282815…
 5 2017-11-28 10:00:00  40.5  26.8  8.92   9.7    452    9.3   0.4     5.1   5.5    0.4    NA  46.7    NA NA          NA    AURN St… 51.46282815…
 6 2017-11-28 20:00:00  63.8  48.7  9.88  14.9    452   15.2  -0.3    10.6  10.7    0.1    NA  20.6    NA NA          NA    AURN St… 51.46282815…
 7 2017-11-28 21:00:00  50.0  39.2  7.05  10.1    452    8.6   1.5     5.2   4.4   -0.8    NA  27.8    NA NA          NA    AURN St… 51.46282815…
 8 2017-11-29 05:00:00  19.0  17.1  1.26   6.6    452    6.7  -0.1     2.4   2.2   -0.2    NA  46.4    NA NA          NA    AURN St… 51.46282815…
 9 2018-04-23 10:00:00  18.9  12.5  4.16  13.5    452    9.1   4.4     2.6   6.1    3.5    NA  87.1    NA NA          NA    AURN St… 51.46282815…
10 2018-04-23 15:00:00  14.7  10.2  2.95   7.9    452    5.4   2.5     2.3   3.5    1.2    NA  87.1    NA NA          NA    AURN St… 51.46282815…
# … with 1,146,606 more rows, and 2 more variables: DateStart <dttm>, DateEnd <dttm>
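To put a number on how sparse each variable is, one option is to compute the share of missing values per column. The sketch below uses a small made-up data frame (toy) rather than the real table, so the figures are purely illustrative.

```r
# Toy data frame with the same kind of gaps as the raw table (values are made up)
toy <- data.frame(
  NO2 = c(16.3, 18.9, NA, 22.5),
  CO  = c(NA, NA, NA, NA),
  O3  = c(64.8, 62.3, 48.7, NA)
)

# Share of missing values per column, sorted from most to least missing
missing_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
print(missing_share)
```

The same call should work on the full data frame, for example colMeans(is.na(aq)), to decide which pollutants have enough coverage to analyse.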

Map locations

Having had a look at the data, I wanted to explore the location of each station. In R, you can use unique to drop the unnecessary data and keep only the stations’ info.


# Check the location_names
aq %>% 
    select(location_name, lon, lat) %>% 
    unique() -> location_names_alltime

# Overall location_names 16
location_names_alltime %>% 
  leaflet() %>% 
  addTiles() %>%
  addMarkers(~lon, ~lat, label = ~as.character(location_name))  

As you can see, 10 stations are within 2 miles of the city centre, while the other 6 are situated outside. The members from the city council mentioned that station installation and removal are subject to contracts with AQ instrument companies. Although the continuous stations are few, the local government has installed over 100 diffusion tubes across the whole city.

Location of Bristol air pollution monitoring sites
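Counting how many stations fall within 2 miles of the city centre can also be done numerically rather than by eye. Below is a minimal sketch using the haversine formula; the city centre coordinates and the two stations are assumptions made up for illustration, not values from the dataset.

```r
# Great-circle distance in miles between two lon/lat points (haversine formula)
haversine_miles <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  earth_radius_miles <- 3958.8
  2 * earth_radius_miles * asin(sqrt(pmin(1, a)))
}

# Hypothetical reference point for the city centre (an assumption)
centre_lon <- -2.5966
centre_lat <- 51.4545

# Two made-up stations, one close to the centre and one further north
stations <- data.frame(
  location_name = c("Station A", "Station B"),
  lon = c(-2.5966, -2.5966),
  lat = c(51.4645, 51.5045)
)

stations$miles_from_centre <- haversine_miles(
  stations$lon, stations$lat, centre_lon, centre_lat
)
stations$within_2_miles <- stations$miles_from_centre <= 2
```

Applied to a table of station coordinates such as location_names_alltime, sum(within_2_miles) would give the number of stations inside the 2 mile radius, for whichever centre point is chosen.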



Results

Temporal Exploration of NO2 in Bristol City

Temporal change of NO2 in Bristol stations. You can roughly identify the measured periods of each station
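The measured period of each station can also be read directly from the data instead of from the plot. The following is a sketch on made-up records; the column name no2 is an assumption about how the pollutant column was renamed.

```r
library(dplyr)

# Toy records for two stations (values are made up)
toy <- data.frame(
  location_name = c("A", "A", "A", "B", "B"),
  datetime = as.POSIXct(
    c("2003-02-01 00:00:00", "2003-06-01 00:00:00", "2003-10-27 00:00:00",
      "2017-04-01 00:00:00", "2019-08-05 00:00:00"),
    tz = "UTC"
  ),
  no2 = c(60, NA, 65, 40, 41)
)

# First and last valid NO2 reading per station, and the span between them
periods <- toy %>%
  filter(!is.na(no2)) %>%
  group_by(location_name) %>%
  summarise(
    first_obs = min(datetime),
    last_obs  = max(datetime),
    n_obs     = n()
  ) %>%
  mutate(days_covered = as.numeric(difftime(last_obs, first_obs, units = "days")))
```

Dropping missing readings first means the period reflects when NO2 was actually recorded, not just when rows exist for the station.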

[Table 2. Mean NO2 and number of exceedances of 200 µg/m3 at each station]

Station Name          Mean NO2 (µg/m3)  Number of Exceedance
Trailer Portway P&R   24.4              43,824
Brislington Depot     26.3              162,971
AURN St Pauls         28.3              115,139
Cheltenham Rd         34.5              22,071
Bath Road             38.7              62,990
Temple Way            40.4              20,553
Fishponds Road        41.1              90,945
Shiner’s Garage       42                74,787
Wells Rd              44                142,001
Parson Street School  47.7              154,189
Newfoundland Rd       54.4              96,407
IKEA M32              61.2              25,464
Temple Meads Station  63.2              6,446
Colston Avenue        65.7              12,284
Rupert Street         93.1              113,951
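A per-station summary like the table above can be produced with a grouped summarise. The sketch below runs on made-up values and assumes a column called no2; it is not the code used during the study group. It keeps the number of valid hourly readings separate from the number of readings above the threshold, so the two counts are easy to tell apart.

```r
library(dplyr)

# Toy hourly NO2 readings for two stations (values are made up)
toy <- data.frame(
  location_name = c("A", "A", "A", "B", "B", "B"),
  no2 = c(50, 210, NA, 30, 40, 250)
)

# Hourly NO2 threshold in micrograms per cubic metre, as in the table caption
limit <- 200

summary_tbl <- toy %>%
  group_by(location_name) %>%
  summarise(
    mean_no2      = mean(no2, na.rm = TRUE),
    n_hours       = sum(!is.na(no2)),
    n_exceedances = sum(no2 > limit, na.rm = TRUE)
  ) %>%
  arrange(mean_no2)
```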

Monthly aggregated boxplots by hours

Monthly aggregated NO2 distributed by hours
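A plot of this kind can be built by extracting the month and hour from the timestamp and drawing one boxplot per hour, faceted by month. The sketch below uses random toy data and an assumed no2 column, so it shows the mechanics only and not the actual figure.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# 60 days of made-up hourly readings starting on 1 January 2018
set.seed(1)
toy <- data.frame(
  datetime = seq(as.POSIXct("2018-01-01 00:00:00", tz = "UTC"),
                 by = "hour", length.out = 24 * 60)
)
toy$no2 <- runif(nrow(toy), min = 5, max = 80)

# Derive the grouping variables from the timestamp
by_hour <- toy %>%
  mutate(
    month = month(datetime),
    hour  = factor(hour(datetime), levels = 0:23)
  )

# One box per hour of day, one panel per month
p <- ggplot(by_hour, aes(x = hour, y = no2)) +
  geom_boxplot() +
  facet_wrap(~month) +
  labs(x = "Hour of day", y = "NO2")
```

Storing the hour as a factor makes ggplot2 draw 24 separate boxes instead of treating the hour as a continuous axis.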

Monthly aggregated boxplots by days of week after 2018

NO2 by days of week (2018-2019)
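The numbers behind a day-of-week comparison can be sketched by keeping only readings from 2018 onwards and grouping by weekday. The data and the no2 column name are made up for illustration; the weekday labels are set explicitly so the result does not depend on the system locale.

```r
library(dplyr)
library(lubridate)

# Toy readings, one of them before 2018 (values are made up)
toy <- data.frame(
  datetime = as.POSIXct(
    c("2017-12-31 08:00:00", "2018-01-01 08:00:00", "2018-01-06 08:00:00",
      "2018-01-08 08:00:00", "2019-01-05 08:00:00"),
    tz = "UTC"
  ),
  no2 = c(99, 40, 20, 60, 30)
)

day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

by_weekday <- toy %>%
  # Keep readings from 2018 onwards
  filter(datetime >= as.POSIXct("2018-01-01", tz = "UTC")) %>%
  # week_start = 1 numbers the days from Monday (1) to Sunday (7)
  mutate(weekday = factor(day_labels[wday(datetime, week_start = 1)],
                          levels = day_labels)) %>%
  group_by(weekday) %>%
  summarise(median_no2 = median(no2), n = n())
```

Stopping before the summarise step leaves a data frame that could be passed to geom_boxplot with weekday on the x axis.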

Summary and Future Work

What did I learn?

Needless to say, I learned to work as a team. Understanding each other’s perspectives and ideas is important; however, we also need to be mindful of our words and behaviour when we are working towards a common goal. Sometimes one person can lead the project (because they know the theme better or are good at allocating jobs), while others can use the experience to learn how to code, or use the time to discuss things together. If someone pushes their knowledge on others in a patronising way without any justification, the team will end up in disaster.

Secondly, Bristol’s air pollution is problematic, but it can be alleviated by understanding various urban aspects. Here, urban aspects can mean traffic signals, urban form, or road width. Bristol is a very hilly city with plenty of traffic signals. We know that traffic signals are meant to control traffic flow and protect pedestrians, but we hadn’t considered how acceleration and deceleration can generate harmful pollutants. Moreover, if a vehicle stops in the middle of a hill and starts again, more pollutants are emitted and affect pedestrians. I wouldn’t conclude by saying we should flatten all the hills, but it is worthwhile to understand why pollution is more concentrated in such areas in Bristol.